The Missing Data Assumptions of the Nonequivalent Groups With Anchor Test (NEAT) Design and Their Implications for Test Equating
Abstract
The nonequivalent groups with anchor test (NEAT) design involves missing data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item-response-theory observed-score-equating method. Each of these three methods makes different assumptions about the missing data in the NEAT design. Though studies have compared the equating performance of the three methods under the NEAT design, none has examined the missing data assumptions and their implications for such comparisons. The missing data assumptions can affect equating studies because the missing data, or their distribution, must be filled in somehow to obtain a true, or criterion, equating function against which the accuracy and bias of the different methods can be compared. If the missing data or their distribution are filled in using the missing data assumptions of a given method, that choice may favor that method in any comparison with the others. This paper first describes the missing data assumptions of the three equating methods and then performs a fair comparison of the three methods using data from three different operational tests. For each data set, we examine how the three equating methods perform when the missing data satisfy the assumptions made by only one of these equating methods. The chain equating method is somewhat more satisfactory overall than the other methods in our fair comparison; hence, we recommend that equating practitioners seriously consider the chain equating method when using the NEAT design.
In addition, we conclude that the results from the different equating methods will tend to agree with each other when proper equating conditions are in place. Moreover, to uncover problems that might not reveal themselves otherwise, it is important for operational testing programs to apply multiple equating methods and study the differences among their results.

R305U07009. Any opinions expressed in this paper are those of the authors and are not necessarily those of ETS or the IES. The authors thank Moses for their comments and Ayleen Stelhorn for the editorial help.
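As a rough illustration of what "filling in the missing data distribution" can mean in a NEAT design, the sketch below applies the poststratification-style assumption: that the conditional distribution of test score X given anchor score A is the same in the population that took X (called P here) and the population that did not (called Q here). The function name `pse_fill_in`, the array layout, and the toy numbers are assumptions made for this example; they are not taken from the paper, and the sketch is not the authors' procedure.

```python
import numpy as np

def pse_fill_in(joint_xa_p, anchor_marginal_q):
    """Fill in the unobserved distribution of X in population Q.

    Assumption (poststratification-style): P(X = x | A = a) is identical
    in populations P and Q.
    joint_xa_p[x, a]     : P(X = x, A = a) observed in population P
    anchor_marginal_q[a] : P(A = a) observed in population Q
    """
    joint_xa_p = np.asarray(joint_xa_p, dtype=float)
    anchor_marginal_q = np.asarray(anchor_marginal_q, dtype=float)

    # Marginal distribution of the anchor score in P
    anchor_marginal_p = joint_xa_p.sum(axis=0)

    # The assumption gives no information for anchor scores never seen in P
    if np.any((anchor_marginal_p == 0) & (anchor_marginal_q > 0)):
        raise ValueError("anchor score observed in Q but never in P")

    # Conditional P(X = x | A = a) in P; columns with zero mass stay zero
    cond_x_given_a = np.divide(
        joint_xa_p,
        anchor_marginal_p,
        out=np.zeros_like(joint_xa_p),
        where=anchor_marginal_p > 0,
    )

    # Reweight the conditionals by the anchor distribution seen in Q
    return cond_x_given_a @ anchor_marginal_q

# Toy example: X and A each take the values 0 and 1
filled = pse_fill_in([[0.2, 0.1], [0.3, 0.4]], [0.25, 0.75])
print(filled)
```

A criterion built this way matches the poststratification assumption by construction, which is the kind of built-in advantage the abstract warns about when methods are compared.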
Related resources
Selection the best Method of Equating Using Anchor-Test Design in Item Response Theory
Explaining the problem. The equating process is used to compare the scores of two different tests with the same theme. The goal of this research is to find the best method of equating data using the logistic model. Method. We use the data of the Ph.D. test in the Statistics major for two consecutive years, 92 and 93. For analyzing, we are specifically using the tests of Statistics major ...
208-2012: How Test Length and Sample Size Have an Impact on the Standard Errors for IRT True Score Equating: Integrating SAS® and Other Software
The standard error of equating is a useful index to quantify the amount of equating error. It is the standard deviation of equated scores over replications of an equating procedure in samples from a population or populations of examinees. The current study estimates the SE of item response theory true score equating in the Nonequivalent Groups with Anchor Test design using simulations. Specifica...
Contributions to Kernel Equating
Andersson, B. 2014. Contributions to Kernel Equating. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences 106. 24 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9089-8. The statistical practice of equating is needed when scores on different versions of the same standardized test are to be compared. This thesis constitutes four contributions...
Equating with bivariate log-linear presmoothing under the common-item nonequivalent groups design: structural zeros and their implications
In equating, when common items are internal and scoring is in terms of the number of correct items, some pairs of total scores (X) and common-item scores (V) can never be observed in a bivariate distribution of X and V; these pairs are called structural zeros (Bishop, Fienberg, & Holland, 2007; Holland & Wang, 1987). This study examines how different approaches to handling structural zeros gi...
IRT Observed-Score Kernel Equating with the R Package kequate
The R package kequate enables observed-score equating using the kernel method of test equating. We present the recent developments of kequate, which provide additional support for item-response theory observed score equating using 2-PL and 3-PL models in the equivalent groups design and non-equivalent groups with anchor test design using chain equating. The implementation also allows for local ...